Text File | 1992-12-05 | 99KB | 1,929 lines

N O N L I N

Nonlinear Regression Analysis Program

A "shareware" program

Phillip H. Sherrod
Member, Association of Shareware Professionals (ASP)

Nonlin allows you to perform statistical regression analyses to estimate the values of parameters for linear, multivariate, polynomial, and general nonlinear functions. The regression analysis determines the values of the parameters which cause the function to best fit the observed data that you provide. Nonlin allows you to specify the function whose parameters are being estimated using ordinary algebraic notation. In addition to determining the parameter estimates, Nonlin can be directed to generate an output file with predicted values and residuals. It can also plot the data observations and the computed function. Although designed for regression analysis, Nonlin can also be used to find the root (zero point) or minimum absolute value of a nonlinear expression. Nonlin is in use at many engineering and research centers around the world.

NONLIN -- Nonlinear Regression Program Page 1

INTRODUCTION TO REGRESSION ANALYSIS

The goal of regression analysis is to determine the values of parameters for a function that cause the function to best fit a set of data observations that you provide. In linear regression, the function is a linear (straight line) equation. For example, if we assume the value of an automobile decreases by a constant amount each year after its purchase, and for each mile driven, the following linear function would predict its value (the dependent variable) as a function of the two independent variables which are age and miles:

     value = price + depage*age + depmiles*miles

where `value', the dependent variable, is the value of the car, `age' is the age of the car, and `miles' is the number of miles that the car has been driven.
The regression analysis performed by Nonlin will determine the best values of the three parameters: `price', the estimated value when age is 0 (i.e., when the car was new); `depage', the depreciation that takes place each year; and `depmiles', the depreciation for each mile driven. The values of `depage' and `depmiles' will be negative because the car loses value as age and miles increase.

In a problem such as this car depreciation example, you must provide a data file containing the values of the dependent and independent variables for a set of observations. In this example each observation record would contain three numbers: value, age, and miles, collected from used car ads for the same model car. The more observations you provide, the more accurate the estimates of the parameters will be. The Nonlin commands to perform this regression are shown below:

     VARIABLES VALUE,AGE,MILES
     PARAMETERS PRICE,DEPAGE,DEPMILES
     FUNCTION VALUE = PRICE + DEPAGE*AGE + DEPMILES*MILES
     DATA
     (data values go here)

Once the values of the parameters are determined by Nonlin, you can use the formula to predict the value of a car based on its age and miles driven. For example, if Nonlin computed a value of 16000 for price, -1000 for depage, and -0.15 for depmiles, then the function

     value = 16000 - 1000*age - 0.15*miles

could be used to estimate the value of a car with a known age and number of miles.

If a perfect fit existed between the function and the actual data, the actual value of each car in your data file would exactly equal the predicted value. Typically, however, this is not the case, and the difference between the actual value of the dependent variable and its predicted value for a particular observation is the error of the estimate which is known as the "deviation" or "residual". The goal of regression analysis is to determine the values of the parameters which minimize the sum of the squared residual values for the set of observations.
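
The residual and sum-of-squares quantities described above can be made concrete with a short calculation. The sketch below is written in Python rather than in Nonlin's command language. It uses the illustrative parameter values from the text (16000, -1000, -0.15) and applies them to three observations that are invented for this sketch; the function names are likewise hypothetical and are not part of Nonlin.

```python
# Illustrative only: the observations below are invented for this sketch.
PRICE, DEPAGE, DEPMILES = 16000.0, -1000.0, -0.15  # example values from the text


def predict(age, miles):
    """Fitted linear function: value = price + depage*age + depmiles*miles."""
    return PRICE + DEPAGE * age + DEPMILES * miles


# Each record is (value, age, miles), the same order as the VARIABLES command.
observations = [
    (13000.0, 2.0, 10000.0),
    (9500.0, 4.0, 20000.0),
    (8300.0, 5.0, 15000.0),
]


def residuals(records):
    """Residual = actual value minus predicted value, one per observation."""
    return [value - predict(age, miles) for value, age, miles in records]


def sum_of_squares(records):
    """Sum of squared residuals for the given observations."""
    return sum(r * r for r in residuals(records))


if __name__ == "__main__":
    for (value, age, miles), r in zip(observations, residuals(observations)):
        print(f"actual={value:8.1f} predicted={predict(age, miles):8.1f} residual={r:7.1f}")
    print("sum of squared residuals:", sum_of_squares(observations))
```

A regression fit searches for the parameter values that make this sum of squared residuals as small as possible.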
This is known as a "least squares" regression fit.

INTRODUCTION TO NONLIN

Nonlin is a very powerful regression analysis program. Using it you can perform multivariate, linear, polynomial, and general nonlinear regression. What this means is that you specify the form of the function to be fitted to the data, and the function can include nonlinear terms such as variables raised to powers and library functions such as log, exponential, sine, etc. Nonlin uses a state-of-the-art regression algorithm that works as well as, or better than, any you are likely to find in commercial statistical packages.

As an example of nonlinear regression, consider another depreciation problem. The value of a used airplane decreases for each year of its age. Assuming the value of a plane falls by the same amount each year, a linear function relating value to age is:

     Value = p0 + p1*Age

where `p0' and `p1' are the parameters whose values are to be determined. However, it is a well known fact that planes (and automobiles) lose more value the first year than the second, and more the second than the third, etc. This means that a linear (straight line) function cannot accurately model this situation. A better, nonlinear, function is:

     Value = p0 + p1*exp(-p2*Age)

where the `exp' function is the value of e (2.7182818...) raised to a power. This type of function is known as "negative exponential" and is appropriate for modeling a value whose rate of decrease is proportional to the difference between the value and some base value.

The F33YEAR.NLR example command file fits a linear function to the value of used airplanes. The F33EXP.NLR example fits a negative exponential function to the same data. Run both examples and compare the fitted functions. See F33.NLR for an example of a multiple regression using three independent variables.
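
To see why the negative exponential form can fit depreciation data better than a straight line, the following Python sketch fits both forms to the same data and compares their sums of squared residuals. It is not Nonlin's algorithm, which this manual does not describe in detail. Instead it relies on the observation that for a fixed p2 the model Value = p0 + p1*exp(-p2*Age) is linear in p0 and p1, so it tries a grid of p2 values and does an ordinary straight-line fit for each. The data are invented (generated from known parameter values), and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_negative_exponential(ages, values, p2_grid):
    """Fit Value = p0 + p1*exp(-p2*Age) by trying each p2 in p2_grid.

    For a fixed p2 the model is linear in p0 and p1, so each trial is
    a straight-line fit against x = exp(-p2*Age).
    Returns (p0, p1, p2, sse) for the trial with the smallest sse.
    """
    best = None
    for p2 in p2_grid:
        xs = [math.exp(-p2 * age) for age in ages]
        p0, p1, sse = fit_line(xs, values)
        if best is None or sse < best[3]:
            best = (p0, p1, p2, sse)
    return best


# Invented data generated from Value = 20000 + 80000*exp(-0.25*Age).
ages = [0, 1, 2, 3, 4, 6, 8, 10, 15, 20]
values = [20000 + 80000 * math.exp(-0.25 * a) for a in ages]

if __name__ == "__main__":
    _, _, linear_sse = fit_line(ages, values)
    grid = [k / 100 for k in range(1, 101)]  # trial p2 values 0.01 .. 1.00
    p0, p1, p2, exp_sse = fit_negative_exponential(ages, values, grid)
    print(f"linear fit SSE:      {linear_sse:.1f}")
    print(f"exponential fit SSE: {exp_sse:.6f}")
    print(f"p0={p0:.1f} p1={p1:.1f} p2={p2:.2f}")
```

Because the invented data follow the exponential form exactly, the exponential fit recovers the generating parameters and leaves essentially no residual, while the straight line leaves a large one. Real data would contain noise, and a general-purpose program such as Nonlin estimates all three parameters iteratively rather than by a fixed grid.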
Much of the convenience of Nonlin comes from the fact that you can enter complicated functions using ordinary algebraic notation. Examples of functions that can be handled with Nonlin include:

     Linear:       Y = p0 + p1*X
     Quadratic:    Y = p0 + p1*X + p2*X^2
     Multivariate: Y = p0 + p1*X + p2*Z + p3*X*Z
     Exponential:  Y = p0 + p1*exp(X)
     Periodic:     Y = p0 + p1*sin(p2*X)
     Misc:         Y = p0 + p1*X + p2*exp(X) + p3*sin(Z)

In other words, the function is a general expression involving one dependent variable (on the left of the equal sign), one or more independent variables, and one or more parameters whose values are to be estimated. Because of its generality, Nonlin can perform all of the regressions handled by ordinary linear or multivariate regression programs as well as nonlinear regression. However, in order to handle nonlinear functions, Nonlin uses an iterative function optimization algorithm which is slower than the simple linear regression algorithm and has the potential for not converging to a solution.

INSTALLING NONLIN

The NONLIN system consists of the following files:

     NONLIN.EXE   -- The executable program.
     NONLIN.DOC   -- Documentation file.
     NONLIN.FON   -- Font file used if you request a plot.
     NONLIN.LJF   -- HP LaserJet font file used if you print a plot.
     *.NLR        -- Example command files.
     REGISTER.DOC -- Form used to register your use of Nonlin.

To install Nonlin, copy the files into the directory of your choice. If you do not plan to generate hard copy output for a LaserJet printer, you may delete the NONLIN.LJF file. If the NONLIN.FON and NONLIN.LJF files are not in your current directory, you must place a command of the following form in your AUTOEXEC.BAT file to tell Nonlin where to look for its font files:

     SET NONLIN=directory

where "directory" is the name of the device and directory where the files are located.
For example, if the files are located in a directory named NONLIN on the C disk, the following command could be used:

     SET NONLIN=C:\NONLIN

USING NONLIN

Once Nonlin has been installed, it can be started using a DOS command of the form:

     NONLIN command_file [listing_file]

where "command_file" is the name of a file containing Nonlin commands that control the analysis. The sections that follow describe these commands. If you specify a command file name without an extension, ".NLR" is used as the default extension. If you omit the command file name, Nonlin prints a list of its commands.

A "listing_file" parameter may be specified on the command line. If you specify a file name, the output (results) of the regression analysis is written to this file. If no file name is specified, the output is written to a file with the same name as the command_file but with the extension ".LST". If you specify a listing file name without an extension, ".LST" is provided as the default extension. Specify NUL for the listing_file if you do not want to generate an output file.

For example, to process a command file named LINEAR.NLR, directing output to a file named LINEAR.LST, use the following command:

     NONLIN LINEAR

To do the same analysis, directing the output to a file named MODEL1.LST, use the following command:

     NONLIN LINEAR MODEL1

At this point, I suggest you pause in your reading and try running a Nonlin example to get a feel for how it works. Several example files with the extension ".NLR" are provided with the distribution. LINEAR.NLR is a good one to start with. If you do not have a graphics monitor, edit the LINEAR.NLR command file (and other example files) and remove the PLOT command.

FUNCTION SPECIFICATION

Much of the power of Nonlin comes from its ability to estimate the value of parameters that are part of complicated functions that you enter in ordinary algebraic form.
This section explains the arithmetic operators and built in functions that are used to specify a function.

Arithmetic Operators

The following arithmetic operators may be used in expressions:

     +        addition
     -        subtraction or unary minus
     *        multiplication
     /        division
     ** or ^  exponentiation

Exponentiation has the highest precedence, followed by multiplication and division, and then addition and subtraction. Parentheses may be used to group terms.

As a convenience, Nonlin allows you to omit the multiplication operator between a numeric constant and a following variable, parameter, or function. For example, the expressions "2pi" and "2 pi" are equivalent to "2*pi". Similarly, "5X" is equivalent to "5*X". However, if you specify a number before the letter "E", it will be taken as the exponential form of a number (see below) rather than the number times the constant e (base of natural logarithms).

Numeric Constants

Numeric constants may be written in their natural form (1, 0, 1.5, .0003, etc.) or in exponential form, n.nnnEppp, where n.nnn is the base value and ppp is the power of ten by which the base is multiplied. For example, the number 1.5E4 is equivalent to 15000. All numbers are treated as "floating point" values, regardless of whether a decimal point is specified or not. As a convenience for entering time values, if a value contains one or more colons, the portion to the left of the colon is multiplied by 60. For example, 1:00 is equivalent to 60; 1:00:00 is equivalent to 3600.

Symbolic Constants

You can use the CONSTANT command to associate symbolic names with constant numeric values. When you use the symbolic name in the function the numeric value is substituted for the symbolic name.

Built-in Constants

There are two built-in numeric constants that may be specified using symbolic names. The symbolic name "PI" is equivalent to the value of pi, 3.14159...
Similarly, the symbolic constant "E" is equivalent to the base of natural logarithms, e, 2.7182818... You may write PI and E using either upper or lower case.

Built in Functions

The following functions are built into Nonlin and may be used in expressions:

ABS(x) -- Absolute value of x.
ACOS(x) -- Arc cosine of x. Angles are measured in radians.
ASIN(x) -- Arc sine of x. Angles are measured in radians.
ATAN(x) -- Arc tangent of x. Angles are measured in radians.
BETAI(x,a,b) -- Incomplete beta function: Ix(a,b). The incomplete beta function can be used to compute a variety of statistical functions. For example, the probability of Student's t with `df' degrees of freedom can be computed with BETAI(df/(df+t^2),.5*df,.5). The probability of the F statistic with df1 and df2 degrees of freedom can be computed with 2*BETAI(df2/(df2+df1*f),.5*df2,.5*df1).
COS(x) -- Cosine of x. Angles are measured in radians.
COSH(x) -- Hyperbolic cosine of x.
COT(x) -- Cotangent of x. (COT(x) = 1/TAN(x)). Angle in radians.
CSC(x) -- Cosecant of x. (CSC(x) = 1/SIN(x)). Angle in radians.
DEG(x) -- Converts an angle, x, measured in radians to the equivalent number of degrees.
EI1(alpha,phi) -- Elliptic integral of the first kind. Computes the integral from 0 to phi radians of the function d.phi/sqrt(1-k**2*sin(phi)**2), where k = sin(alpha). alpha and phi must be in the range 0 to pi/2.
EI2(alpha,phi) -- Elliptic integral of the second kind. Computes the integral from 0 to phi radians of the function sqrt(1-k**2*sin(phi)**2)*d.phi, where k = sin(alpha). alpha and phi must be in the range 0 to pi/2.
EIC1(alpha) -- Complete elliptic integral of the first kind. Computes the integral from 0 to pi/2 radians of the function d.phi/sqrt(1-k**2*sin(phi)**2), where k = sin(alpha). alpha must be in the range 0 to (less than) pi/2.
EIC2(alpha) -- Complete elliptic integral of the second kind.
Computes the integral from 0 to pi/2 radians of the function sqrt(1-k**2*sin(phi)**2)*d.phi, where k = sin(alpha). alpha must be in the range 0 to pi/2.
ERF(x) -- Standard error function of x.
EXP(x) -- e (base of natural logarithms) raised to the x power.
FAC(x) -- x factorial (x!). Note, the FAC function is computed using the GAMMA function (FAC(x)=GAMMA(x+1)) so non-integer argument values may be computed.
GAMMA(x) -- Gamma function. Note, GAMMA(x+1) = x! (x factorial).
GAMMAI(x) -- Reciprocal of GAMMA function (GAMMAI(x) = 1/GAMMA(x)).
GAMMALN(x) -- Log (base e) of the GAMMA function.
HAV(x) -- Haversine of x. (HAV(x) = (1-COS(x))/2). Angle in radians.
J0(x) -- Bessel function of the first kind, order zero.
J1(x) -- Bessel function of the first kind, order one.
JN(n,x) -- Bessel function of the first kind, order n.
LOG(x) -- Natural logarithm of x.
LOG10(x) -- Base 10 logarithm of x.
LOG2(x) -- Base 2 logarithm of x.
MAX(x1,x2) -- Maximum value of x1 or x2.
MIN(x1,x2) -- Minimum value of x1 or x2.
NORMAL(x) -- Normal probability distribution of x. X is in units of standard deviations from the mean. See also the NPD function. NORMAL(x) = NPD(x,0,1).
NPD(x,mean,std) -- Normal probability distribution of x with specified mean and standard deviation. X is in units of standard deviations from the mean.
PAREA(x) -- Area under the normal probability distribution curve from -infinity to x. (i.e., integral from -infinity to x of NORMAL(x)).
PULSE(a,x,b) -- Pulse function. If the value of x is less than a or greater than b, the value of the function is 0. If x is greater than or equal to a and less than or equal to b, the value of the function is 1. In other words, it is 1 for the domain (a,b) and zero elsewhere. If you need a function that is zero in the domain (a,b) and 1 elsewhere, use the expression (1-PULSE(a,x,b)).
RAD(x) -- Converts an angle measured in degrees to the equivalent number of radians.
SEC(x) -- Secant of x. (SEC(x) = 1/COS(x)). Angle in radians.
SEL(a1,a2,v1,v2) -- If a1 is less than a2 then the value of the function is v1. If a1 is greater than or equal to a2, then the value of the function is v2.
SIN(x) -- Sine of x. Angles are measured in radians.
SINH(x) -- Hyperbolic sine of x.
SQRT(x) -- Square root of x.
STEP(a,x) -- Step function. If x is less than a, the value of the function is 0. If x is greater than or equal to a, the value of the function is 1. If you need a function which is 1 up to a certain value and then 0 beyond that value, use the expression STEP(x,a). See PIECE.NLR for an example of this function.
T(n,x) -- Chebyshev polynomial of order n.
TAN(x) -- Tangent of x. Angles are measured in radians.
TANH(x) -- Hyperbolic tangent of x.
Y0(x) -- Bessel function of the second kind, order zero.
Y1(x) -- Bessel function of the second kind, order one.
YN(n,x) -- Bessel function of the second kind, order n.

NONLIN COMMAND FILES

The commands described in this section are placed in a command file. When you start Nonlin, you specify the name of the command file as a parameter on the command line. For example, if the command file name is CAR.NLR, the following command would cause Nonlin to execute the commands in the command file:

     NONLIN CAR.NLR

If you do not specify a file name extension for the command file, ".NLR" is used by default. The output of the regression for this example would be written to a file named CAR.LST.

Command files can be created using a text editor such as EDIT-32, EDLIN, the DOS version 5 EDIT program, or any other editor or word processor that is capable of creating an ASCII text file without formatting codes.

Comments may be placed in command files by preceding the comment with an exclamation point. Entire lines may be used for comments and comments can be placed at the end of commands.
Command lines can be continued by placing a semicolon character as the last non-blank character on the line (a comment may follow the semicolon) and then continuing the command on the following line(s).

Every command file must contain the following commands: VARIABLES, PARAMETERS, FUNCTION, and DATA. The DATA statement introduces the data for the analysis and must be the last command in the file (data records may follow it). Other, optional, commands may be interspersed in the command file. The following is an example of a complete command file:

     VARIABLES VALUE,AGE,MILES
     PARAMETERS BASE,DEPAGE,DEPMILES
     FUNCTION VALUE = BASE + DEPAGE*AGE + DEPMILES*MILES
     DATA
     (data records follow)

NONLIN COMMANDS

The following is a list of the valid Nonlin commands that can be placed in a Nonlin command file. Command keywords may be abbreviated to the first three letters except for CONSTANT, CONSTRAIN, and CONFIDENCE which require six letters. Nonlin commands are not case sensitive.

TITLE string (optional) -- Specifies a title line that is printed with the results of the analysis.

VARIABLES var1,var2,... (required) -- Specifies the names of the variables that will be used in the function. The dependent variable and the independent variables must be specified. The order of the variable names must match the order of the data values on each observation record (the dependent variable may come before or after the independent variables). You may define more variables than you actually use in the function specification. A maximum of 12 variables may be specified. The length of a variable name is limited to 10 characters. Capitalize the variable names as you want them displayed in the results. You may specify all of the variables on a single command line (which may be continued), or you may use multiple VARIABLES commands.
If you use multiple commands, the order in which they appear in the command file must match the order of the variable values on each observation record. The VARIABLES command must precede the FUNCTION command. See F33.NLR for an example of a multiple regression using three independent variables.

PARAMETERS param1[=initial1],param2[=initial2],... (required) -- Specifies the names of the parameters whose values are to be determined by Nonlin. Nonlin is capable of handling up to 12 parameters. The parameter names may not exceed 10 characters in length. Do not specify any parameters that are not used in the function. The PARAMETERS command must precede the FUNCTION command.

Optionally, an initial estimate of the parameter value may be specified by following the parameter name with an equal sign and the value. If no value is specified, 1 is used by default. Specifying an initial value that is near the actual value usually speeds up the operation of Nonlin and may enable it to successfully converge to a solution. If Nonlin is unable to converge to a solution, try specifying different starting values for the parameters. Try to specify a value that at least has the same sign as the expected final value.

The CONSTRAIN command (described below) can be used to limit the range of values for parameters. The SWEEP command can be used to perform the regression analysis with a range of parameter initial values. The CONSTANT command can be used to define a parameter with a fixed value.

CONFIDENCE [percent] (optional) -- Specifies that a confidence interval is to be printed for each estimated parameter. The purpose of regression analysis is to determine the best estimate of parameter values. However, as with most statistical calculations, the values determined are estimates of the true values.
The CONFIDENCE command causes Nonlin to print a table showing the range of possible values for each parameter given a specified confidence value. The "percent" parameter specifies the probability that the actual value of the parameter is within the confidence interval to be computed. For example, the command

     CONFIDENCE 95

specifies that the confidence interval(s) are to be computed such that there is a 95 percent probability that the actual values of the parameters are within the intervals (or that there is a 5 percent chance that the parameters are outside the intervals). The "percent" parameter may range from 50 to 99.999. If the CONFIDENCE command is used without specifying a percent value, 90 is used by default.

CONSTANT parameter=value (optional) -- Specifies the name of a symbolic constant and associates a numeric value. You may then use the symbolic name in the function and the corresponding constant numeric value will be substituted. This is useful when you are trying out different models and want to easily be able to change a constant value for each run. The CONSTANT commands must precede the FUNCTION command. The following is an example of a symbolic constant named "Roomtemp" that causes the value 73 to be substituted in the function:

     Variable Time              ! Cooling time in seconds
     Variable Temp              ! Temperature of object
     Constant Roomtemp = 73     ! Ambient temperature
     Parameter InitTemp         ! Initial temperature
     Parameter Coolrate         ! Cooling rate factor
     Function Temp = Roomtemp + InitTemp * exp(-Coolrate * Time)

CONSTRAIN parameter=lowvalue,highvalue (optional) -- Specifies a lower and upper limit on the range of a parameter value. During the solution process, Nonlin may allow a parameter's value to temporarily move in a direction away from its final value.
With some functions it may be necessary to constrain the parameter's value so that it does not go negative (e.g., if the function takes the square root of the parameter), or zero (if the parameter is in a denominator). If a parameter is tightly constrained, Nonlin may report "singular convergence" because it is unable to converge to an optimum value of the parameter; however, the estimated values of other parameters may be useful.

Only a single parameter and its associated limits may be specified on each CONSTRAIN command, but you may use multiple CONSTRAIN commands. The PARAMETERS command must precede the CONSTRAIN command. Use the CONSTANT command if you wish to define a parameter with a fixed value.

The parameter value is allowed to range from `lowvalue' to `highvalue'. If you want to prevent a parameter value from going to zero, you must specify a value greater than zero for the low value (specifying zero would allow it to reach, but not go below, zero). For example, the following command constrains the value of `age' to be greater than zero and less than or equal to 100:

     CONSTRAIN age = .0001,100

See the COOLING.NLR, F33EXP.NLR, and POWER.NLR files for examples of the CONSTRAIN command.

COVARIANCE (optional) -- Causes the variance-covariance matrix for the parameters to be printed.

SWEEP parameter=lowvalue,highvalue,stepsize (optional) -- Specifies that the regression analysis is to be performed repeatedly with a set of starting values for the parameter. The first analysis is performed with the parameter having the `lowvalue'; the value of `stepsize' is then added to the parameter's initial value and the analysis is performed again. The process is repeated until the value of the parameter reaches `highvalue'. Each time the analysis is performed the value of the residual sum of squares is compared with the best previous result.
The estimated values of the parameters for the best starting value are saved and used for the final analysis and report.

Only one parameter may be specified on each SWEEP command, but you may have as many SWEEP commands as there are parameters. The number of regression analyses performed will be equal to the product of the number of parameter values for each SWEEP command.

The SWEEP command is useful when you are trying to fit a complicated function that may have "local minimum" values other than the "global minimum". Periodic functions (sin, cos, etc.) are especially troublesome. See the SINE.NLR command file for an example of the SWEEP command.

FUNCTION depvar = function (required) -- Specifies the form of the function whose parameters are to be determined. The dependent variable must be the only thing to the left of the equal sign. The expression to the right of the equal sign may contain variables, parameters, constants, operators, and library functions such as sqrt, sin, exp, etc. The VARIABLES and PARAMETERS commands must have appeared in the command file before the FUNCTION command, and all variables and parameters used in the function must have been specified on those commands. Some example FUNCTION commands are shown below:

     FUNCTION Y = P0 + P1*X
     FUNCTION DISTANCE = .5 * ACCEL * TIME^2
     FUNCTION VALUE = PRICE + YRDEP*AGE + MILEDEP*MILES
     FUNCTION POPULATN = BASE * GROWRATE * EXP(TIME)

TOLERANCE value (optional, default=1E-10) -- Specifies the tolerance factor that is used to determine when the algorithm has converged to a solution. Reducing the tolerance value may produce a slightly more accurate result but will increase the number of iterations and the running time. The tolerance value must be in the range 1E-15 to 1E-1.

ITERATIONS value (optional, default=50) -- Specifies the maximum number of iterations that should be attempted by the algorithm.
If the solution does not converge to the limit specified by the TOLERANCE command (or to the default tolerance) before the maximum number of iterations is reached, the process is stopped and the results are printed. Failure to converge before the specified number of iterations could be caused by one of three things:

     1. The maximum allowed number of iterations may be too small. Try using an ITERATIONS command with a larger value.

     2. The tolerance factor may be too small. Even a properly converging solution will eventually "level off" or oscillate around a good, but non-zero, sum of squares value. Try using the TOLERANCE command to increase the tolerance value.

     3. The function may not be converging. Try specifying better (or at least different) starting values for the parameters on the PARAMETERS command. Consider using the SWEEP command to specify a range of parameter starting values.

REGISTER (optional) -- The REGISTER command suppresses the copyright and registration message that is normally printed as part of a Nonlin report. The use of this command is a reminder that you should register your use of Nonlin. Note, if you find Nonlin to be useful, educational, or entertaining you are expected to register your use so that the author can be justly compensated and that development of the program can continue. Use the form in REGISTER.DOC to register your use.

OUTPUT [TO file] var1,var2,... (optional) -- Specifies that after the analysis is completed, data values are to be printed or written to a file. If the "TO file" portion of the command is specified, the output is written to the specified file. If this portion of the command is omitted, the output values are printed along with the results. If a file name is specified without an extension, ".OUT" is used by default. The list of variable names determines which variables are written to the file and the order in which the values appear in each output record.
Any variable previously declared on a VARIABLES command may be specified. In addition, the following special variable names may appear in the output list:

     $OBS -- The observation record number, starting at 1 and increasing by 1.

     $PREDICTED -- The predicted value for the dependent variable for the observation, given the independent variable values and the parameters as calculated by the analysis.

     $RESIDUAL -- The difference between the actual value of the dependent variable and its predicted value.

Examples of OUTPUT commands are shown below:

     OUTPUT AGE,MILES,VALUE,$PREDICTED,$RESIDUAL
     OUTPUT TO GROWTH.DAT $OBS,TIME,POPULATN,$PREDICTED

POUTPUT file (optional) -- The POUTPUT command specifies that Nonlin is to write the final estimated values of the parameters to a file. Each parameter value is written to a separate line of the file. This command is useful to create a file of estimated parameter values to be fed into another analysis program.

PLOT [options] (optional) -- Display a plot of the calculated function and the data observations. The PLOT command can only handle a single independent variable (multiple independent variables would require an n-dimensional surface plot); however, there is no restriction on the number of parameters being estimated. You must have a CGA, EGA, or VGA monitor to use the PLOT command, and the NONLIN.FON font file must be in the current directory or in a directory specified by the NONLIN environment variable.

In the plot, the data values you provided are shown as blue X's and the function fitted to the data by Nonlin is shown as a solid green line. Press Return to proceed with the analysis after you have finished looking at the plot.

The following options may be specified on the PLOT command:

     GRID -- display grid lines to make it easier to estimate values.

     RESIDUAL -- draw vertical lines from each observed data point to the corresponding point on the calculated function line.
These lines represent the "residual" value that Nonlin is attempting to minimize. See also the descriptions of the RPLOT and NPLOT commands.

     ITERATION -- draw a plot for each iteration of the regression analysis. Normally, the plot is drawn after the analysis has converged to a solution; you may use the ITERATION option to observe the function during each iteration of the analysis as it converges to fit the data.

     VALUES -- use in conjunction with the ITERATION option to cause the current parameter values to be displayed before the plot for the current iteration.

     PRINT -- print a copy of the plot on an HP LaserJet printer. Nonlin writes the plot to the PRN device which must be attached to an HP Series II or Series III printer. The NONLIN.LJF font file must be in the current directory or in a directory specified by the NONLIN environment variable.

     NOPAUSE -- do not pause after the plot is displayed. Normally, Nonlin pauses after displaying a plot to allow you time to examine it; you press Enter to continue execution once you have finished looking at the plot. The NOPAUSE option causes Nonlin to continue with execution without pausing after the plot is displayed. This is useful in conjunction with the PRINT option when Nonlin is run in a batch file and you want to generate a hardcopy plot but not pause after the screen display.

The option keywords may be abbreviated to their first letter. If more than one option is specified, separate them with commas. For example, to produce a plot with both grid lines and residual lines, use the following command:

     PLOT GRID,RESIDUAL

RPLOT [options] (optional) -- Display a plot of the residual values. A "residual" value (or error deviation) is the difference between an actual value of the dependent variable for an observation and the predicted value based on the function fitted by the regression analysis.
If the calculated function exactly predicted the actual observation values, all of the residual values would be zero. However, this is usually not the case and the residual values show where, and by how much, the fitted function fails to predict the actual observations.

The RPLOT command causes Nonlin to display a plot showing the residual values on the vertical (Y) axis and the independent variable values on the horizontal (X) axis. However, if there is more than one independent variable, Nonlin displays the residual values on the vertical (Y) axis and the dependent variable values on the horizontal (X) axis. The plot title indicates if the dependent variable was used for the X axis.

A residual plot is very useful for determining if the form of the function being fitted is appropriate for the data values. If the residual values are randomly distributed in positive and negative directions, then the form (shape) of the fitted function is probably appropriate for the data and the deviations are due to random measurement errors. If, however, the residuals show a systematic pattern such as a periodic cycle, then the function may not be appropriate for the data values. See the discussion of the Durbin-Watson statistic for additional information about autocorrelated residual values.

The PLOT, RPLOT, and NPLOT commands may be used in the same command file. Press Return to proceed with the analysis after you have finished looking at the plot.

The following options may be specified on the RPLOT command:

GRID -- display grid lines to make it easier to estimate values.

PRINT -- print a copy of the plot on an HP LaserJet printer. Nonlin writes the plot to the PRN device, which must be attached to an HP Series II or Series III printer.

NOPAUSE -- do not pause after the plot is displayed. Normally, Nonlin pauses after displaying a plot to allow you time to examine it; you press Enter to continue execution once you have finished looking at the plot.
The NOPAUSE option causes Nonlin to continue with execution without pausing after the plot is displayed.

The option keywords may be abbreviated to their first letter. If more than one option is specified, separate them with commas.

NPLOT [options] (optional) -- Display a normal probability plot of the residual values. In this plot, the actual value of each residual is plotted on the vertical (Y) axis and the expected value of the residual, assuming the residuals are normally distributed, is plotted on the horizontal (X) axis. If the residuals are normally distributed, the resulting plot will be a straight line passing through the origin with a slope of 1 (i.e., the actual value of each residual should equal the expected value from the normal distribution). If the residuals are not normally distributed, the plot will deviate from a straight line.

This plot also computes the correlation between the actual residual values and their expected values and displays the correlation coefficient in the title line "(r=n.nn)". If the residual values are normally distributed, the correlation should be close to 1.00. A correlation value less than 0.94 suggests that the residuals are not normally distributed.

The NPLOT command may be used even if there is more than one independent variable. The PLOT, RPLOT, and NPLOT commands may be used in the same command file. Press Return to proceed with the analysis after you have finished looking at the plot.

The following options may be specified on the NPLOT command:

GRID -- display grid lines to make it easier to estimate values.

PRINT -- print a copy of the plot on an HP LaserJet printer. Nonlin writes the plot to the PRN device, which must be attached to an HP Series II or Series III printer.

NOPAUSE -- do not pause after the plot is displayed.

The option keywords may be abbreviated to their first letter. If more than one option is specified, separate them with commas.
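As a rough illustration of the correlation reported by NPLOT, the following Python sketch pairs the sorted residuals with expected normal values and computes their correlation coefficient. The plotting-position formula (i - 0.375)/(n + 0.25) and all function names are assumptions made for this example; this manual does not state how Nonlin computes the expected values.

```python
from math import sqrt
from statistics import NormalDist


def expected_normal_scores(n):
    """Approximate expected standard-normal order statistics for n points.

    Assumption: Blom's plotting positions are used; Nonlin's own formula
    is not documented here.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def normal_plot_correlation(residuals):
    """Correlate the sorted residuals with their expected normal scores."""
    ordered = sorted(residuals)
    return correlation(ordered, expected_normal_scores(len(ordered)))
```

The expected scores are left unscaled because the correlation coefficient does not change when one series is multiplied by a constant; only the slope of the plotted line would. A residual set with one extreme outlier, such as nine values of 1 and one value of 100, falls well below the 0.94 guideline.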
DOMAIN lowvalue,highvalue (optional) -- Specifies the domain over which the plot is to be generated. If the DOMAIN statement is omitted, the domain of the independent variable is used for the plot. The DOMAIN statement can be used to generate a plot of the fitted function extrapolated over the specified domain. You can also use the DOMAIN command to restrict the domain and "zero in" on a particular range of the function. The DOMAIN command only affects the PLOT command; it does not affect the regression calculation or the RPLOT or NPLOT commands.

PRESOLUTION value (optional) -- Specifies whether plots sent to HP LaserJet printers should use 150 or 300 dot-per-inch resolution. The value parameter must be 150 or 300. The default value of 150 causes the plots to use most of the horizontal width of an 8.5x11 inch page. These plots are suitable for direct transfer to overhead transparencies. Specifying 300 for the resolution produces smaller plots that are suitable for inclusion in printed documents.

WIDTH value (optional) -- Specify the width, in inches, of printed plots. Due to memory space considerations, the maximum width is limited to about 7.9 inches for 150 DPI resolution and 4.5 inches for 300 DPI resolution. If you have limited memory space, you may have to reduce the width to be able to produce printed plots. This statement is ignored unless you request that a plot be printed.

DATA [file] (required) -- Specifies the name of the file containing the data records, or introduces the data records which follow the command. If a file name is specified on the DATA command, the file is opened, its data records are read, and the regression analysis is performed. If a file name is specified without an extension, ".DAT" is used by default. If no file name is specified on the DATA command, the data records must immediately follow the DATA command in the command file.
Each data record must contain at least as many data values as the number of variables specified on the VARIABLES command(s). The order of the variables as specified on the VARIABLES command must match the order of the values in each observation. Any data values beyond those required for the specified variables are ignored. Each observation must begin on a new line. The data values must be separated by one or more spaces and/or a comma. Data values may contain decimal points and may be expressed in exponential notation (i.e., n.nnnnEppp). As a convenience for entering time values, if a value contains one or more colons, the portion to the left of the colon is multiplied by 60. For example, 1:00 is equivalent to 60; 1:00:00 is equivalent to 3600. You may continue data lines by specifying a semicolon as the last non-blank character on a record and then placing the continuation value on the following line(s).

The DATA command must be the last command in the command file. If no file name is specified on the DATA command, the data records must immediately follow the DATA command in the command file.

The following is an example of a complete command file including data records:

     VARIABLES AGE,MILES,VALUE
     PARAMETERS BASE,DEPAGE,DEPMILES
     FUNCTION VALUE = BASE + DEPAGE*AGE + DEPMILES*MILES
     DATA
     2 10000 13000
     4 42000 9000
     1 7000 17000
     6 52000 6000
     5 48000 8000

If the data records had been placed in a separate file named CAR.DAT, the DATA statement would be changed to "DATA CAR.DAT".

UNDERSTANDING THE RESULTS

Nonlin prints a variety of statistics at the end of each analysis. For each variable, Nonlin lists the minimum value, the maximum value, the mean value, and the standard deviation. You should confirm that these values are within the ranges you expect.
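To make the per-variable summary concrete, the following Python sketch reads the car data records from the example command file and computes the same four quantities. It is an illustration only, not Nonlin's code: the function names are hypothetical, the sample (n-1) standard deviation is an assumption because this manual does not say which form Nonlin prints, and continuation lines ending in a semicolon are not handled.

```python
import re
from statistics import mean, stdev


def parse_value(text):
    """Convert one data field to a number.

    Colon-separated fields follow the time convention described for data
    records: everything to the left of each colon is multiplied by 60.
    """
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def parse_records(lines, num_variables):
    """Split records on spaces and/or commas; extra trailing values are ignored."""
    records = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            records.append([parse_value(f) for f in fields[:num_variables]])
    return records


def summarize(records, names):
    """Return {name: (minimum, maximum, mean, standard deviation)}."""
    summary = {}
    for column, name in enumerate(names):
        values = [record[column] for record in records]
        # Assumption: sample standard deviation (n-1 divisor).
        summary[name] = (min(values), max(values), mean(values), stdev(values))
    return summary


CAR_DATA = [
    "2 10000 13000",
    "4 42000 9000",
    "1 7000 17000",
    "6 52000 6000",
    "5 48000 8000",
]

summary = summarize(parse_records(CAR_DATA, 3), ["AGE", "MILES", "VALUE"])
for name, (low, high, average, deviation) in summary.items():
    print(f"{name:6s} min={low:g} max={high:g} mean={average:g} stddev={deviation:g}")
```

Checking a summary like this against the ranges you expect is a quick way to catch a mistyped value or a column order that does not match the VARIABLES command.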
For each parameter, Nonlin displays the initial parameter estimate (which you specified on the PARAMETER command, or 1 by default), the final (maximum likelihood) estimate, the standard error of the estimated parameter value, the "t" statistic comparing the estimated parameter value with zero, and the significance of the t statistic.

The final estimate parameter values are the results of the analysis. By substituting these values in the equation you specified to be fitted to the data, you will have a function that can be used to predict the value of the dependent variable based on a set of values for the independent variables. For example, if the equation being fitted is

     y = p0 + p1*x

and the final estimates are 1.5 for p0 and 3 for p1, then the equation

     y = 1.5 + 3*x

is the best equation of this form that will predict the value of y based on the value of x.

The "t" statistic is computed by dividing the estimated value of the parameter by its standard error. This statistic is a measure of the likelihood that the actual value of the parameter is not zero. The larger the absolute value of t, the less likely that the actual value of the parameter could be zero.

The "Prob(t)" value is the probability of obtaining an estimate at least this far from zero if the actual parameter value is zero. The smaller the value of Prob(t), the more significant the parameter and the less plausible it is that the actual parameter value is zero. For example, assume the estimated value of a parameter is 1.0 and its standard error is 0.7. Then the t value would be 1.43 (1.0/0.7). If the computed Prob(t) value was 0.05, this indicates that an estimate this far from zero would occur only 5% of the time if the actual value of the parameter were zero. If Prob(t) was 0.001, such an estimate would occur only 1 time in 1000 if the actual value were zero.
If Prob(t) was 0.92, an estimate this far from zero would occur 92% of the time even if the actual value of the parameter were zero; this suggests that the term of the regression equation containing the parameter can be eliminated without significantly affecting the accuracy of the regression. The t statistic probability is computed using a two-sided test. The CONFIDENCE command can be used to cause Nonlin to print confidence intervals for parameter values. The SQUARE.NLR example regression includes an extraneous parameter (p0) whose estimated value is much smaller than its standard error; the Prob(t) value is 0.99982, indicating that the estimate is entirely consistent with an actual value of zero.

In addition to the variable and parameter values, Nonlin displays several statistics that indicate how well the equation fits the data. The "Final sum of squared deviations" is the sum of the squared differences between the actual value of the dependent variable for each observation and the value predicted by the function, using the final parameter estimates.

The "Average deviation" is the average over all observations of the absolute value of the difference between the actual value of the dependent variable and its predicted value.

The "Maximum deviation for any observation" is the maximum difference (ignoring sign) between the actual and predicted value of the dependent variable for any observation.

The "Proportion of variance explained (R^2)" indicates how much better the function predicts the dependent variable than just using the mean value of the dependent variable. This is also known as the "coefficient of multiple determination." It is computed as follows: Suppose that we did not fit an equation to the data and ignored all information about the independent variables in each observation. Then, the best prediction for the dependent variable value for any observation would be the mean value of the dependent variable over all observations.
The "variance" is the sum of the squared differences between the mean value and the value of the dependent variable for each observation. Now, if we use our fitted function to predict the value of the dependent variable, rather than using the mean value, a second kind of variance can be computed by taking the sum of the squared difference between the value of the dependent variable predicted by the function and the actual value. Hopefully, the variance computed by using the values predicted by the function is better (i.e., a smaller value) than the variance computed using the mean value. The "Proportion of variance explained" is computed as 1 - (variance using predicted value / variance using mean). If the function perfectly predicts the observed data, the value of this statistic will be 1.00 (100%). If the function does no better a job of predicting the dependent variable than using the mean, the value will be 0.00.

The "adjusted coefficient of multiple determination (Ra^2)" is an R^2 statistic adjusted for the number of parameters in the equation and the number of data observations. It is a more conservative estimate of the percent of variance explained, especially when the sample size is small compared to the number of parameters. It is computed using the formula:

     Ra^2 = 1 - (n-1)/(n-p) * (1-R^2)

where `n' is the number of observations, `p' is the number of parameters, and `R^2' is the unadjusted coefficient of multiple determination.

The "Durbin-Watson test for autocorrelation" is a statistic that indicates the likelihood that the deviation (error) values for the regression have a first-order autoregression component. The regression models assume that the error deviations are uncorrelated. In business and economics, many regression applications involve time series data.
If a non-periodic function, such as a straight line, is fitted to periodic data the deviations have a periodic form and are positively correlated over time; these deviations are said to be "autocorrelated" or "serially correlated." Autocorrelated deviations may also indicate that the form (shape) of the function being fitted is inappropriate for the data values (e.g., a linear equation fitted to quadratic data).

If the deviations are autocorrelated, there may be a number of consequences for the computed results:

1) The estimated regression coefficients no longer have the minimum variance property;

2) the mean square error (MSE) may seriously underestimate the variance of the error terms;

3) the computed standard error of the estimated parameter values may underestimate the true standard error, in which case the t values and confidence intervals may be incorrect.

Note that if an appropriate periodic function is fitted to periodic data, the deviations from the regression will be uncorrelated because the cycle of the data values is accounted for by the fitted function.

Small values of the Durbin-Watson statistic indicate the presence of autocorrelation. Consult significance tables in a good statistics book for exact interpretations; however, a value less than 0.80 usually indicates that autocorrelation is likely. If the Durbin-Watson statistic indicates that the residual values are autocorrelated, it is recommended that you use the RPLOT and/or NPLOT commands to display a plot of the residual values.

If an NPLOT command is used to produce a normal probability plot of the residuals, the correlation between the residuals and their expected values (assuming they are normally distributed) is printed in the listing. If the residuals are normally distributed, the correlation should be close to 1.00. A correlation less than 0.94 suggests that the residuals are not normally distributed.
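The fit statistics described in this section can be sketched in a few lines of Python. R^2 and Ra^2 follow the formulas given in this manual. The Durbin-Watson value uses the standard textbook definition (the sum of squared differences between successive residuals divided by the sum of squared residuals), which is an assumption here because this manual does not spell out the formula. The function name is hypothetical, and the sketch assumes the fit is not exact (a zero sum of squared residuals would cause a division by zero).

```python
def fit_statistics(actual, predicted, num_params):
    """Return (R^2, adjusted R^2, Durbin-Watson) for a fitted model.

    actual, predicted -- values of the dependent variable, in observation order
    num_params        -- number of estimated parameters (p in the Ra^2 formula)
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]

    # "Variance" using the fitted function and using the mean, as described above.
    sum_sq_residuals = sum(e * e for e in residuals)
    sum_sq_about_mean = sum((a - mean_actual) ** 2 for a in actual)

    r_squared = 1 - sum_sq_residuals / sum_sq_about_mean
    adjusted = 1 - (n - 1) / (n - num_params) * (1 - r_squared)

    # Assumption: standard Durbin-Watson definition.
    successive = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
    durbin_watson = successive / sum_sq_residuals
    return r_squared, adjusted, durbin_watson


# Residuals that alternate in sign give a large Durbin-Watson value;
# residuals that come in runs of the same sign give a small one.
alternating = fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.1, 3.9, 5.1], 2)
runs = fit_statistics([1, 1, 1, -1, -1, -1], [0, 0, 0, 0, 0, 0], 1)
```

In the second call the residuals form two runs of equal sign, and the resulting Durbin-Watson value falls below the 0.80 guideline mentioned above.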
An "Analysis of Variance" table provides statistics about the overall significance of the model being fitted.

THEORY OF OPERATION

Nonlin uses a model/trust-region technique along with an adaptive choice of the model Hessian. The algorithm is essentially a combination of Gauss-Newton and Levenberg-Marquardt methods; however, the adaptive algorithm often works much better than either of these methods alone.

The basis for the minimization technique used by Nonlin is to compute the sum of the squared residuals for one set of parameter values and then slightly alter each parameter value and recompute the sum of squared residuals to see how the parameter value change affects the sum of the squared residuals. By dividing the difference between the original and new sum of squared residual values by the amount the parameter was altered, Nonlin is able to determine the approximate partial derivative with respect to the parameter. This partial derivative is used by Nonlin to decide how to alter the value of the parameter for the next iteration. If the function being modeled is well behaved, and the starting value for the parameter is not too far from the optimum value, the procedure will eventually converge to the best estimate for the parameter. This procedure is carried out simultaneously for all parameters and is, in fact, a minimization problem in n-dimensional space, where `n' is the number of parameters.

For a much more detailed explanation of the regression algorithm used by Nonlin see ACM Transactions on Mathematical Software 7,3 (Sept. 1981) "Dennis, J.E., Gay, D.M., and Welsch, R.E. -- An adaptive nonlinear least-squares algorithm."

CONVERGENCE CRITERION

Nonlin has several convergence criteria that stop the iterative minimization procedure. The TOLERANCE command can be used to alter the convergence tolerance value. Two internal variables are used to determine when convergence has occurred.
RFCTOL has a default value of 1E-10 and can be altered by use of the TOLERANCE command. AFCTOL has a default value of 1E-20 and is only altered by the TOLERANCE command if the value specified is less than the default value. In the discussion which follows, the "function value" is half the sum of the squared residuals computed using the current parameter estimates.

"Relative function convergence" is reported if the predicted maximum possible function reduction is at most RFCTOL*ABS(F0), where F0 is the function value at the start of the current iteration, and if the last step attempted achieved no more than twice the predicted function decrease.

"Absolute function convergence" is reported if the function value is less than AFCTOL.

HINTS FOR NONLIN USE

Convergence Failures

One of the potential problems that confronts any nonlinear minimization procedure is non-convergence. Non-convergence is usually not a problem for regressions using a linear model, but becomes a more serious consideration when using complicated nonlinear functions; increasing the number of parameters aggravates the problem. Non-convergence can occur in two ways: the solution may diverge or it may converge to the wrong solution -- a local minimum rather than the global minimum.

Periodic functions, such as sin and cos, are particularly prone to convergence problems. For example, consider a nonlinear regression performed with the function:

     y = offset + amplitude * sin(frequency * x)

where x and y are variables, and offset, amplitude, and frequency are the parameters whose values are to be determined. If the starting value for frequency is not reasonably close to the correct value, the solution may converge to a harmonic (multiple) or subharmonic (fundamental) value of the frequency. A command file named SINE.NLR is supplied with the commands and data to perform this analysis. The SWEEP command can be very useful in cases like the sine example.
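The sensitivity to the starting frequency can be explored with a small Python sketch. It is not Nonlin's algorithm and not the SWEEP command: it is a plain grid scan over trial frequencies in which, for each trial frequency, the best offset and amplitude are found by ordinary linear least squares. The data are synthetic (offset 10, amplitude 2, frequency 3, chosen for this sketch; the contents of SINE.NLR are not reproduced here) and all names are hypothetical.

```python
import math


def best_offset_amplitude(xs, ys, frequency):
    """For a fixed frequency, fit y = offset + amplitude*sin(frequency*x).

    With the frequency held fixed the model is linear in offset and
    amplitude, so a simple linear regression of y on sin(frequency*x)
    gives the least-squares values. Returns (offset, amplitude, ssr).
    """
    s = [math.sin(frequency * x) for x in xs]
    n = len(xs)
    s_mean = sum(s) / n
    y_mean = sum(ys) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, ys))
    amplitude = sxy / sxx if sxx > 0 else 0.0
    offset = y_mean - amplitude * s_mean
    ssr = sum((yi - (offset + amplitude * si)) ** 2 for si, yi in zip(s, ys))
    return offset, amplitude, ssr


# Synthetic, noise-free observations (an assumption of this sketch).
xs = [0.1 * i for i in range(63)]
ys = [10 + 2 * math.sin(3 * x) for x in xs]

# Scan trial frequencies 0.5, 0.6, ..., 6.0 and keep the smallest sum of
# squared residuals (ssr).
trial_frequencies = [k / 10 for k in range(5, 61)]
scan = [(best_offset_amplitude(xs, ys, f)[2], f) for f in trial_frequencies]
best_ssr, best_frequency = min(scan)
best_offset, best_amplitude, _ = best_offset_amplitude(xs, ys, best_frequency)
print(best_frequency, best_offset, best_amplitude)
```

Printing the whole `scan` list shows how the sum of squared residuals varies with the trial frequency; a scan of this kind is one way to pick a starting value close enough for the regression to converge to the intended solution.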
In the SINE.NLR example analysis, the actual value of the frequency is 3; the function converges to the correct solution if the starting value is in the range 2.6 to 3.3. However, this example is quite insensitive to the starting value of the amplitude parameter. With an actual value of 2, the correct solution is found with starting values from 1 through 10000. Similarly, the offset parameter, which had an actual value of 10, was successfully determined with starting values ranging from 1 to over 50000.

Another example which is sensitive to a parameter starting value is POWER.NLR, which attempts to determine the values of the parameters p0, p1, and p2 for the function

     y = p0 + p1*x^p2

(where "x^p2" means x raised to the p2 power). The actual value of p2 in the example data is 2; the solution converges correctly if the starting value of p2 is in the range 1.8 to 3.8. As with the other example, the solution is relatively insensitive to the starting values of p0 and p1.

Singular Matrix Problems

Another possible problem is that the analysis may stop with the message "Singular convergence. Mutually dependent parameters?". This is usually due to one of two things: (1) a redundant parameter that is co-dependent with another parameter, or (2) a situation where the value of one parameter "blocks" the effect of other parameters.

As an example of a redundant parameter, consider the function

     y = p0 + p1*p2*x

This is a simple linear equation except there are two parameters, p1 and p2, which are both factors of the variable x. It should be clear that there is no unique solution to this problem since any value of p1 is possible if the right value of p2 is chosen. Similarly, the function

     y = p0 + p1 + p2*x

has no unique solution since either p0 or p1 is redundant. Similarly, in the equation

     y = p0 + p1*exp(x+p2)

either p1 or p2 is redundant.
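The redundancy can be checked numerically. In the Python sketch below (function names and parameter values are chosen only for illustration), two different parameter sets produce identical predictions for every x, so no amount of data can tell them apart. For the exponential case this follows from p1*exp(x+p2) = (p1*exp(p2))*exp(x).

```python
import math


def product_model(p0, p1, p2, x):
    """y = p0 + p1*p2*x; only the product p1*p2 affects the result."""
    return p0 + p1 * p2 * x


def exp_model(p0, p1, p2, x):
    """y = p0 + p1*exp(x+p2); only p1*exp(p2) affects the result."""
    return p0 + p1 * math.exp(x + p2)


xs = [0.0, 0.5, 1.0, 2.0]

# (p1, p2) = (2, 3) and (6, 1) share the same product, 6.
same_product = all(
    math.isclose(product_model(1, 2, 3, x), product_model(1, 6, 1, x)) for x in xs
)

# (p1, p2) = (1, 1) and (e, 0) share the same value of p1*exp(p2).
same_exp = all(
    math.isclose(exp_model(1, 1.0, 1.0, x), exp_model(1, math.e, 0.0, x)) for x in xs
)

print(same_product, same_exp)
```

Removing one of the redundant parameters (for example, fitting y = p0 + p1*x) restores a unique solution.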
The second type of singular matrix problem can be illustrated by the function

     y = p0 + p1*x^p2

If, during the solution process, p1 takes on the value 0, then varying the value of p2 has no effect on the equation and Nonlin cannot figure out which way to change the value of p2 to move toward convergence. The solution to this problem is to assign a starting value that is not zero to p1, and use the CONSTRAIN command to force p1 to remain non-zero.

PERFORMANCE ISSUES

Nonlin is carefully programmed and compiled with an optimizing compiler for maximum performance. However, Nonlin is a real "number cruncher," and the nonlinear regression algorithm is mathematically very elaborate. During each iteration, Nonlin computes gradients, Jacobians, Hessians, and eigenvalues, and performs QR and Cholesky matrix decompositions. All calculations are carried out using double precision (64 bit) floating point.

Nonlin does not require an 80x87 numeric coprocessor, but its performance is greatly enhanced if one is present. In fact, an 8088 CPU with an 8087 numeric coprocessor can perform regression analyses faster than a 20 MHz 80386 that does not have a coprocessor. If you have an 8088 without a coprocessor, be patient -- Nonlin is probably giving it the workout of its life.

Very long running times can result if you use the SWEEP command with many starting values. The problem is compounded if you have multiple SWEEP commands. If you use the SWEEP command to try a large number of starting parameter values, you can save time by using the ITERATIONS command to specify a small number of iterations (such as 5) during the initial attempt to find a solution. Once a feasible set of starting parameter values has been determined, remove the SWEEP command, specify the starting values on the PARAMETERS command, increase the number of iterations, and rerun the analysis to get the final result.
PROGRAM LIMITS

The following is a summary of the Nonlin program limitations:

     Maximum number of variables = 12
     Maximum number of parameters = 12
     Maximum length of variable or parameter names = 10

The maximum number of data observations that Nonlin can handle depends on the number of parameters as shown by the table that follows:

     # Parameters     Max Observations
           1               2019
           2               1611
           3               1339
           4               1144
           5                997
           6                883
           7                791
           8                715
           9                652
          10                599

EXAMPLE ANALYSES

A number of example regression analysis files are provided with your Nonlin distribution. All of the example command files have the extension ".NLR". Some of the important ones are described below; others contain comment lines that explain what they do.

LINEAR.NLR -- Simple linear regression with plotted function and data.

QUAD.NLR -- Fit a quadratic equation. Plot the function and the data.

ASYMPTOT.NLR -- Fit an asymptotic function Y = 12 - 10/X.

F33.NLR -- Multivariate linear regression (multiple regression). Calculate the value of a used Beech F33 Bonanza airplane using a linear model based on its age, the number of hours on its airframe, and the number of hours on its engine. The t value and Prob(t) indicate that the number of hours on the engine (`Engdep' parameter) is not significant to the regression model; the other parameters are significant but airframe hours is less significant than the base price and age of the plane.

F33YEAR.NLR -- Similar to F33.NLR except the price of the Bonanza is calculated based on a linear function of only the age.

F33EXP.NLR -- Similar to F33YEAR.NLR except a negative exponential function is used rather than a linear function. Compare the fit of this model with that of the F33YEAR.NLR example.

SINE.NLR -- Fit an equation involving a sin function. The SWEEP command is used to find a starting point that will converge.

SQUARE.NLR -- Fit a sine series to a square wave.
Note in this example that the 'p0' parameter, which represents the constant term of the equation, has an estimated value of 9.22715E-006 (very nearly zero) and a standard error of 0.0398754. This yields a t value of nearly zero and Prob(t) of 0.99982, which means that the estimate is entirely consistent with an actual value of zero for p0 (it is in fact zero). This illustrates how you can use the t value and Prob(t) to identify extraneous parameters.

COOLING.NLR -- Fit an equation involving an exponential function. If a heated object is allowed to cool, the rate of cooling at any instant is proportional to the difference between the object's temperature and the ambient (room) temperature. In other words, an object cools faster at first, while it is hot, and the rate of cooling slows down as the temperature of the object approaches the ambient temperature. The function that relates the object's temperature to time is:

     Temperature = Roomtemp+InitTemp*exp(-Coolrate*Time)

where InitTemp is the number of degrees above room temperature at time 0, and Coolrate is a factor that depends on the mass of the object, how well it is insulated, etc. The exp function is the value of e (2.7182818...) raised to a power. The COOLING.NLR example determines the parameters InitTemp and Coolrate to fit an equation of this form to some data the author collected.

BOIL.NLR -- The boiling point of water decreases as the pressure in the vessel containing the water decreases. "Clapeyron's equation" shows that the boiling point is related to pressure according to the following function:

     Temperature = b / log(Pressure/a) - 459.7

where `Temperature' is in degrees Fahrenheit (the 459.7 constant converts between degrees Fahrenheit and degrees Rankine -- relative to absolute zero), `Pressure' is the pressure in the vessel in pounds per square inch, and `a' and `b' are parameters whose values are to be determined.
The data for this example was collected by the author's son for a science project.

MAGNET.NLR -- Fit a function involving an arc tangent and a variable to the third power. This is an interesting physics problem. If a magnet is placed due east of a compass, the deflection of the compass needle from north is equal to the arc tangent of the ratio of the strength of the magnet's field relative to the earth's magnetic field. The strength of the magnet's field at the compass is inversely proportional to the cube of the distance from the magnet to the compass. Thus, the function relating these terms is

     Deflection = deg(atan(Strength / Distance ^ 3))

The deg function converts an angle in radians to degrees. In the example, Deflection and Distance are the variables, and the value of the Strength parameter is determined.

DIODE.NLR -- The current through a diode increases sharply as the voltage across the diode is increased. An equation that approximates the current flow as a function of the voltage is:

     I = exp(b*(V-c))

where `I' is the current, `V' is the voltage, and `b' and `c' are parameters that are to be estimated by the nonlinear regression.

AVLTIME.NLR -- An AVL tree is a balanced binary tree used to store information in a computer's memory. Because the entries in an AVL tree are kept in sorted order, and the tree is kept in a balanced form, it is possible to rapidly find any entry in the tree. The time required to create an AVL tree with N entries is approximately equal to:

     Time = a + b*N*log2(N)

where `a' is a constant term equal to the overhead involved in starting and completing a tree creation, and `b' is a growth coefficient that depends on the speed of the computer. The log2(N) function is the log base 2 of N (the number of entries). The AVLTIME.NLR example fits an equation to a data set that relates the time in seconds required to create an AVL tree with the number of entries in the tree.
PIECE.NLR -- Piecewise linear function. Fit a function consisting of two linear pieces that bend at X=5. When X is less than 5, the slope of the function is B1. When X is greater than or equal to 5, the slope is B2. B0 is the Y value of the function at X=5 (i.e., at the pivot point). The step(a,x) function returns the value 0 when x is less than `a'; it returns 1 when x is greater than or equal to `a'.

NONLIN USE FOR ROOT FINDING AND FUNCTION MINIMIZATION

Although it is designed for nonlinear regression analysis, Nonlin can also be used to find the root (zero point) or minimum absolute value of a nonlinear expression. To use Nonlin in this fashion follow these steps:

(1) Do not use any VARIABLE statements;

(2) Use PARAMETER statements to specify the names and optional starting values for the parameters whose values are to be determined as the roots or minimum value of the expression;

(3) Use the FUNCTION statement to specify the expression whose roots or minimum value is to be found; do NOT specify a dependent variable and equal sign -- specify only the expression that is to be minimized;

(4) Do not include any data records after the DATA statement; it simply signals the end of the command file and causes the analysis to begin.

The following is an example command file to find the root of the expression SIN(X)-LOG(X):

     PARAMETER X
     FUNCTION SIN(X) - LOG(X)
     DATA

Notice that the "variable" in the expression, X, is not declared to be a variable but rather a parameter. This example is included in the file MINSL.NLR which you can run.

For this type of analysis, Nonlin determines the values of the parameters that minimize the absolute value of the expression. If the expression has a zero value (i.e., a root), that value is found since that is the smallest possible absolute value.
If the expression does not have a zero point, Nonlin determines the values of the parameters that produce the smallest absolute value of the expression. For example, the expression 2*x^2-3*x+10 does not have a root but reaches a minimum value of 8.875 when x is 0.75. The MINPAROB.NLR command file contains this example.

There are a number of cautions that you should keep in mind when using Nonlin to find roots or minimum values:

1. Nonlin will find only one root or minimum value per analysis. For example, the expression 9-x^2 has two roots: -3 and +3. Nonlin will find one of the roots; which one it finds depends on the starting value specified for X.

2. Nonlin will find only real roots, not complex.

3. If the expression contains a local minimum, Nonlin may find it rather than the global minimum or root. Of course, if you are looking for a local minimum in a certain region this could be considered a feature. For example, the expression 0.5*x^3+5*(x-2)^2+15 has a local minimum at x=1.61 and a root at x=-13.38. If the starting value of x is less than -8.3 the root is found; if the starting value is greater than -8.3, the local minimum is found. If the expression contains only a single variable, use the Mathplot program to graphically display the expression and determine a good starting value for the variable (see the end of this document for additional information about Mathplot). The SWEEP command can also be used to try multiple starting values when searching for a global minimum.

FUNCTION MINIMIZATION EXAMPLES

MINFALL.NLR -- The time taken for an object to slide down a frictionless guide from position (0,h) to another position (d,0) (i.e., falling through a distance `h' while moving horizontally a distance `d') depends on the path that the object takes as it follows the guide.
It turns out that the path that minimizes the descent time is not a straight line from (0,h) to (d,0) but rather a curve called a brachistochrone, with a steeper slope near the beginning that gives the object a chance to accelerate quickly, and then a shallower slope further on. Finding the shape of this curve is a classic problem in the branch of mathematics called the Calculus of Variations.

The MINFALL example solves a simpler case of this problem: the object slides along a straight guide from (0,1000) to an intermediate position (px,py), and then along another straight guide from (px,py) to (1000,0). What point, (px,py), minimizes the descent time?

[Note concerning the answer: The fall time for the object if it follows a straight guide from (0,1000) to (1000,0) is 2.0203 seconds; the fall time if it follows the two straight segments found by MINFALL is 1.8748; the fall time if it follows the ideal curved brachistochrone is 1.8590. The speed of the object at the end of the fall is the same regardless of the path taken (conservation of energy).]

MINFUEL.NLR -- A lunar lander is hovering above the surface of the moon looking for a suitable landing site. Available fuel is critical and the desired site is 200 meters away. How long should the horizontal thruster be fired to start and stop the motion over the ground? The vertical thruster must be used continuously to keep the lander from being pulled to the surface. If too little horizontal thrust is used the spacecraft will move slowly and much fuel will be consumed by the vertical thruster counterbalancing the downward gravitational pull while hovering over the surface. On the other hand, if the horizontal thruster is fired for a long time, the spacecraft will move quickly (minimizing the hovering time) but excessive fuel will be used during the horizontal acceleration and deceleration.
MINFUEL.NLR determines how long the thruster should be fired during the start and stop accelerations such that the total fuel consumption (start thrust + stop thrust + hover) is minimized.

ACKNOWLEDGEMENT

The nonlinear regression algorithm used by Nonlin was published in: Dennis, J.E., Gay, D.M., and Welsch, R.E., "An adaptive nonlinear least-squares algorithm," ACM Transactions on Mathematical Software 7, 3 (Sept. 1981).

USE AND DISTRIBUTION OF NONLIN

You are welcome to make copies of this program and pass them on to friends or post this program on bulletin boards or distribute it via disk catalog services provided the entire Nonlin distribution is included in its original, unmodified form. A distribution fee may be charged for the cost of the diskette, shipping and handling. However, Nonlin may not be sold, or incorporated in another product that is sold, without the permission of Phillip H. Sherrod. Vendors are encouraged to contact the author to get the most recent version of Nonlin.

Because Nonlin is a shareware product, you are granted a no-cost trial period of 30 days during which you may evaluate it. If you find Nonlin to be useful, educational, and/or entertaining, and continue to use it beyond the 30-day trial period, you are required to compensate the author by sending the registration form printed at the end of this document (and in REGISTER.DOC) with the appropriate registration fee to help cover the development and support of Nonlin.

In return for registering, you will be authorized to continue using Nonlin beyond the trial period and you will receive the most recent version of the program, a laser-printed, bound manual, and three months of support via telephone, mail, or CompuServe. Your registration fee will be refunded if you encounter a serious bug that cannot be corrected. The author frequently improves Nonlin and it is likely that the version you have is not the most recent version.
Note that the cost of registering Nonlin is insignificant compared with what you would have to pay to purchase a commercial statistical package with an equivalent regression capability.

This program is produced by a member of the Association of Shareware Professionals (ASP). ASP wants to make sure that the shareware principle works for you. If you are unable to resolve a shareware-related problem with an ASP member by contacting the member directly, ASP may be able to help. The ASP Ombudsman can help you resolve a dispute or problem with an ASP member, but does not provide technical support for members' products. Please write to the ASP Ombudsman at 545 Grover Road, Muskegon, MI 49442 or send a CompuServe message via CompuServe Mail to ASP Ombudsman 7007,3536.

You are welcome to contact the author:

     Phillip H. Sherrod
     4410 Gerald Place
     Nashville, TN 37205-3806 USA
     615-292-2881 (evenings)
     CompuServe: 71333,27

Both the Nonlin program and documentation are copyright (c) 1992 by Phillip H. Sherrod. You are not authorized to modify the program. "Nonlin" is a trademark.

Disclaimer

Nonlin is provided "as is" without warranty of any kind, either expressed or implied. This program may contain "bugs" and inaccuracies, and its results should not be assumed to be correct unless they are verified by independent means. The author assumes no responsibility for the use of Nonlin and will not be responsible for any damage resulting from its use.

M A T H P L O T
Mathematical Function Plotting Program
Special Offer

If you like Nonlin, you should check out the Mathplot program by the same author. Mathplot allows you to specify complicated mathematical functions using ordinary algebraic expressions and immediately plot them.
Four types of functions may be specified: cartesian (Y=f(X)); parametric cartesian (Y=f(T) and X=f(T)); polar (Radius=f(Angle)); and parametric polar (Radius=f(T) and Angle=f(T)). Up to four functions may be plotted simultaneously. Scaling is automatic. Options are available to control axis display and labeling as well as grid lines. Hard copy output may be generated as well as screen display. Mathplot is an ideal tool for engineers, scientists, math and science teachers, and anyone else who needs to quickly visualize mathematical functions.

SPECIAL OFFER

Registered users of Nonlin can order Mathplot for a special price of $18. Or, for an even better deal, if you register Nonlin and order Mathplot at the same time, you can get both for $40.

TSX-32 Multi-User Operating System

If you have a need for a multi-user, multi-tasking operating system, you should look into TSX-32. TSX-32 is a full-featured, high performance, multi-user operating system for the 386 and 486 that provides both 32-bit and 16-bit program support. With facilities such as multitasking and multisessions, networking, virtual memory, background batch queues, data caching, file access control, real-time, and dial-in support, TSX-32 provides a solid environment for a wide range of applications.

TSX-32 is not a limited, 16-bit, multi-DOS add-on. Rather, it is a complete 32-bit operating system which makes full use of the hardware's potential, including protected mode execution, virtual memory, and demand paging. TSX-32 sites range from small systems with 2-3 terminals to large installations with more than 64 terminals on a single 386.

In addition to supporting most popular 16-bit DOS programs, TSX-32 also provides a 32-bit "flat" address space with both Phar Lap and DPMI compatible modes of execution. Since the DOS file structure is standard for TSX-32, you can directly read and write DOS disks.
And, you can run DOS part of the time and TSX-32 the rest of the time on the same computer.

TSX-32 allows each user to control up to 10 sessions. Programs can also "fork" subtasks for multi-threaded applications. The patented Adaptive Scheduling Algorithm provides consistently good response time under varying conditions.

The TSX-32 network option provides industry standard TCP/IP networking through Ethernet and serial lines. Programs can access files on remote machines as easily as on their own machine. The SET HOST command allows a user on one machine to log onto another computer in the network. FTP, Telnet, and NFS are available for interoperability with other systems.

TSX-32 allows simultaneous real-time program execution with normal time-sharing operations. Real-time programs can connect to interrupts, access device control ports, and preempt other operations when necessary. TSX-32 is an ideal process control or data collection system.

TSX-32 is, quite simply, the best and most powerful operating system available for the 386 and 486. For additional information contact:

     S&H Computer Systems, Inc.
     1027 17th Avenue South
     Nashville, TN 37212 USA
     615-327-3670 (voice)
     615-321-5929 (fax)
     CompuServe: 71333,27

=====================================================================
                         Software Order Form
=====================================================================

Name ______________________________________________________

Address ___________________________________________________

City _______________________ State _______ Zip ___________

Telephone _________________________________________________

CompuServe account (optional) _____________________________

Nonlin version ____________________________________________

Bulletin board where you found Nonlin _____________________

Comments __________________________________________________

Check the box below which indicates your order type:

     ___ I wish to register Nonlin ($25).

     ___ I wish to order Mathplot ($20).

     ___ I wish to register Nonlin and order Mathplot ($40).

Add $5 to any amount shown above if the software is being shipped out of the United States.

In return for registering, you will receive the most recent version of the program, a laser-printed, bound copy of the manual, and three months of telephone or CompuServe support. Your registration fee will be refunded if you find a serious bug that cannot be corrected.

Distribution disk choice (check one):

     3.50" HD (1.4 MB) ______
     5.25" HD (1.2 MB) ______
     5.25" DD (360 KB) ______

Send this form with the amount indicated to the author:

     Phillip H. Sherrod
     4410 Gerald Place
     Nashville, TN 37205-3806 USA
     615-292-2881 (evenings)
     CompuServe: 71333,27